MLLM Post-Training (RL, SFT) Codebase Selection

1 Evaluation Criteria

Code readability and ease of modification

How buggy it is, and how good the developer support is (are frameworks from big companies better in this respect?)

Model and algorithm coverage

Parallelism strategies

Acceleration techniques

2 Existing Frameworks

TRL - Transformer Reinforcement Learning: supports SFT; a lightweight framework that is essentially a thin wrapper around the Hugging Face Transformers Trainer

GitHub - modelscope/ms-swift: from Alibaba; supports an especially wide range of models; maintainers reply in the WeChat group even on weekends; supports NPU

GitHub - InternLM/xtuner: An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

GitHub - OpenRLHF/OpenRLHF-M: An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. The multimodal version of OpenRLHF; also supports SFT

GitHub - volcengine/verl: verl: Volcano Engine Reinforcement Learning for LLMs

2.1 Passed Over for Now

GitHub - hiyouga/LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024). Supports NPU; built by a Beihang University PhD student, but questions in the WeChat group don't get many answers

GitHub - pytorch/torchtune: PyTorch native post-training library. Looks quite good, but unfortunately has no multimodal support (apart from their own llama3v), and the set of supported features is genuinely limited.

GitHub - volcengine/veScale: A PyTorch Native LLM Training Framework. From ByteDance; no longer maintained.

2.2 Others

GitHub - rhymes-ai/Aria: Codebase for Aria - an Open Multimodal Native MoE. A fairly decent codebase

GitHub - TencentARC/mllm-npu: training multimodal large language models on Ascend NPUs. Hasn't been updated in a long time

GitHub - Yangyi-Chen/SOLO: [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling". Includes training code that uses Megatron.

GitHub - yujunhuics/Reyes: InternViT + MLP + Qwen2.5; a lightweight framework.